Research by
Michael Siebel

Bad Banking Behavior
Analyzing Bank Mortgage during the 2008 Housing Bubble

Michael Siebel
December 2020


Model Predictions Script


Objectives


This script examines five features: credit score, debt-to-income ratio, loan-to-value ratio, median household income at the 3-digit zip code level, and the dollar change in mortgage loans made relative to 1 year and 5 years earlier. The last feature was created by totaling the loan amount for each bank within a 3-digit zip code during each fiscal-year quarter.
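The code that builds the loan-change feature is not part of this script. As a rough sketch of one plausible reading of the description above, the snippet below totals loan amounts per bank, 3-digit zip code, and quarter, then subtracts the total from the same quarter one year earlier. The column names (`Zip3`, `Year`, `Quarter`, `Amount`) and the toy values are assumptions made for illustration only.

```python
import pandas as pd

# Toy loan-level data; every name and value here is hypothetical.
loans = pd.DataFrame({
    "Bank": ["A"] * 6,
    "Zip3": ["200"] * 6,
    "Year": [2005, 2005, 2005, 2006, 2006, 2006],
    "Quarter": [1, 1, 2, 1, 1, 2],
    "Amount": [100_000, 150_000, 200_000, 300_000, 50_000, 180_000],
})

keys = ["Bank", "Zip3", "Year", "Quarter"]

# Total lending per bank, 3-digit zip code, and quarter.
totals = loans.groupby(keys, as_index=False)["Amount"].sum()

# Shift each total forward one year so it lines up with the later quarter.
prior = totals.rename(columns={"Amount": "Amount 1 Yr Ago"}).copy()
prior["Year"] = prior["Year"] + 1

# Dollar change versus the same quarter one year earlier (NaN if no history).
merged = totals.merge(prior, on=keys, how="left")
merged["Loan Change (1 Yr)"] = merged["Amount"] - merged["Amount 1 Yr Ago"]
print(merged)
```

A 5-year version would follow the same pattern with `prior["Year"] + 5`.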

Using saved models, each of the five features was replaced, one at a time, by an improved and a weakened assumption based on the interquartile range (25th to 75th percentiles) of the feature across all banks. Predicted probabilities were then generated to estimate what the foreclosure rate would be if each bank's behavior were different. For example, a high credit score is associated with fewer foreclosures. Among all banks, the average credit score was 719 (on a scale of 300 to 850). I set the credit score at each bank to the 75th percentile (an improved assumption of a 770 credit score) and to the 25th percentile (a weakened assumption of a 675 credit score), leaving all other feature values unchanged. I ran these values through the saved model detailed in the section above and analyzed the change in foreclosure rates. One can interpret the findings as: “If GMAC Mortgage only lent to those with a credit score of 770, with all other considerations staying the same, its foreclosure rate is predicted to fall from 9.7% to 1%.”
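The internals of `changing_assumptions` live in Functions.ipynb and are not shown here. The sketch below illustrates only the general idea described above: overwrite one feature with a pooled percentile, leave everything else alone, and re-score. The helper `counterfactual_rate`, the stand-in scorer `toy_predict_proba`, and the toy data are all hypothetical and are not the author's models.

```python
import numpy as np
import pandas as pd

def counterfactual_rate(X, feature, percentile, predict_proba, threshold=0.5):
    """Set `feature` to its pooled percentile for every row and return
    (replacement value, predicted foreclosure rate in percent)."""
    value = np.percentile(X[feature], percentile)
    X_cf = X.copy()                # all other features stay unchanged
    X_cf[feature] = value          # every loan gets the same value
    flagged = predict_proba(X_cf) >= threshold
    return value, 100.0 * flagged.mean()

# Hypothetical stand-in for a saved classifier: risk falls as credit score rises.
def toy_predict_proba(X):
    return 1.0 / (1.0 + np.exp((X["Credit Score"] - 680.0) / 25.0))

X_toy = pd.DataFrame({
    "Credit Score": [600, 640, 675, 700, 724, 750, 770, 800],
    "Debt-to-Income": [45, 50, 38, 30, 29, 41, 25, 20],
})

baseline = 100.0 * (toy_predict_proba(X_toy) >= 0.5).mean()
value_hi, rate_hi = counterfactual_rate(X_toy, "Credit Score", 75, toy_predict_proba)
value_lo, rate_lo = counterfactual_rate(X_toy, "Credit Score", 25, toy_predict_proba)

print("Baseline rate:", baseline)
print("Improved assumption:", value_hi, rate_hi)
print("Weakened assumption:", value_lo, rate_lo)
```

Because every loan receives the same replacement value, the result is a "what if the whole portfolio looked like this" estimate rather than a forecast for any individual loan.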


Load Functions

In [1]:
# Load functions
%run Functions.ipynb
pd.set_option("display.max_columns", 200)
pd.set_option('display.max_rows', 200)

# Load data (raw string keeps the Windows path backslashes literal)
with open(r'..\Data\Pickle\df.pkl', 'rb') as file_to_open:
    df = pickle.load(file_to_open)

# Drop the merge ID column
df = df.drop(labels='Loan ID', axis=1)

# Convert Inf values to NA
df = df.replace([np.inf, -np.inf], np.nan)
Using TensorFlow backend.
In [2]:
## Bank and Classifier Lists
banks = ['Bank of America','Wells Fargo Bank','CitiMortgage',
         'JPMorgan Chase','GMAC Mortgage','SunTrust Mortgage',
         'AmTrust Bank','PNC Bank','Flagstar Bank']

banks_plus = banks + ['All Banks']
clfs_str = ['RFC', 'RFC PCA', 'RUS Boost'] 

# Rename Columns
df = df.rename(columns={"Original Combined Loan-to-Value (CLTV)": "Loan-to-Value (LTV)", 
                        "Original Debt to Income Ratio": "Debt-to-Income",
                        "Loan Change (1 Year)": "Loan Change (1 Yr)",
                        "Loan Change (5 Years)": "Loan Change (5 Yr)",
                        "Lnlsnet (1 Yr)": "Loan Liabilities (1 Yr)",
                        "Lnlsnet (5 Yr)": "Loan Liabilities (5 Yr)"})

## Create an environment variable to avoid using the GPU. This can be changed.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

Data Wrangling

In [3]:
# Verify Bank Counts
df['Bank'].value_counts()
Out[3]:
Bank of America      650087
CitiMortgage         260698
Wells Fargo Bank     214039
JPMorgan Chase       202997
GMAC Mortgage        178160
SunTrust Mortgage    141398
PNC Bank             100351
AmTrust Bank          79360
Flagstar Bank         66637
Name: Bank, dtype: int64
In [4]:
# Variables to drop
dropvars = ['File Year', 'Year', 'Month', 'Region', 'FIPS',
            'Zip Code', 'Mortgage Insurance Type', 'Property State',
            'First Payment', 'Original Loan-to-Value (LTV)']
df = df.drop(labels=dropvars, axis=1)
df = df.filter(regex=r'^(?!Asset).*$')
df = df.filter(regex=r'^(?!Liab).*$')
df = df.filter(regex=r'^(?!Eqtot).*$')
df = df.filter(regex=r'^(?!Dep).*$')

# Convert Original Date to Numeric
df['Reported Period'] = df['Reported Period'].astype(float).astype(int).astype(str)
df['Reported Period'] = df['Reported Period'].apply(lambda x: x.zfill(6))
df['Reported Period'] = df['Reported Period'].map(lambda x: x[:2] + '/' + x[2:])
df = change_date(df, 'Reported Period')
df = change_date(df, 'Original Date')

# Missingness to drop
df = df.dropna()

# All data
y_all = df['Foreclosed']
X_all = df.drop(labels=['Foreclosed', 'Zero Balance Code'], axis=1) 

# Split Train (30%): test_size = 0.7 holds out 70% of rows, consistent with the training shape printed below
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size = 0.7, 
                                                    stratify = y_all, random_state=2019)
# Split the held-out 70% evenly into Val (35%) and Test (35%)
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size = 0.5, 
                                                stratify = y_test, random_state=2019)

# One hot encoding on remaining data
Bnk_train = X_train['Bank'].reset_index().iloc[:,1]
X_train = onehotencoding(X_train)
Bnk_val = X_val['Bank'].reset_index().iloc[:,1]
X_val = onehotencoding(X_val)
Bnk_test = X_test['Bank'].reset_index().iloc[:,1]
X_test = onehotencoding(X_test)
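As a small check on how `test_size` determines the split proportions in the cell above, the sketch below repeats the same two-stage stratified split on synthetic data. The arrays `X_demo` and `y_demo` are made up for illustration; only the `train_test_split` arguments mirror the cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1,000 rows with roughly 10% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = (rng.random(1000) < 0.1).astype(int)

# Stage 1: test_size=0.7 keeps 30% for training and holds out 70%.
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.7, stratify=y_demo, random_state=2019
)

# Stage 2: the held-out rows are divided evenly into validation and test.
X_va, X_te, y_va, y_te = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=2019
)

print("train / val / test sizes:", len(X_tr), len(X_va), len(X_te))
print("positive share:", y_tr.mean(), y_va.mean(), y_te.mean())
```

Stratifying on the target keeps the share of positives roughly equal across the three pieces, which matters when foreclosures are a small minority of loans.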
In [5]:
# Update Macroeconomic variables (will not use test set)
X_train, X_val, X_test = pca_fred(X_train, X_val, X_test)

# Check columns
X_train.columns
Out[5]:
Index(['Reported Period', 'Original Interest Rate', 'Original Mortgage Amount',
       'Original Loan Term', 'Original Date', 'Loan-to-Value (LTV)',
       'Single Borrower', 'Debt-to-Income', 'Loan Purpose', 'Number of Units',
       'Mortgage Insurance %', 'Credit Score', 'Loan Change (1 Yr)',
       'Loan Change (5 Yr)', 'Median Household Income', 'Number of Employees',
       'Loan Liabilities (5 Yr)', 'Loan Liabilities (1 Yr)',
       'Origination Channel_B', 'Origination Channel_C',
       'Origination Channel_R', 'Bank_AmTrust Bank', 'Bank_Bank of America',
       'Bank_CitiMortgage', 'Bank_Flagstar Bank', 'Bank_GMAC Mortgage',
       'Bank_JPMorgan Chase', 'Bank_PNC Bank', 'Bank_SunTrust Mortgage',
       'Bank_Wells Fargo Bank', 'First Time Home Buyer_N',
       'First Time Home Buyer_Y', 'Property Type_CO', 'Property Type_CP',
       'Property Type_MH', 'Property Type_PU', 'Property Type_SF',
       'Occupancy Type_I', 'Occupancy Type_P', 'Occupancy Type_S',
       'Relocation Mortgage Indicator_N', 'Relocation Mortgage Indicator_Y',
       'File Quarter_Q1', 'File Quarter_Q2', 'File Quarter_Q3',
       'File Quarter_Q4', 'Macroeconomy PCA 1', 'Macroeconomy PCA 2',
       'Macroeconomy PCA 3', 'Macroeconomy PCA 4', 'Macroeconomy PCA 5'],
      dtype='object')
In [6]:
# List of banks
banks = ['Bank of America','Wells Fargo Bank','CitiMortgage',
         'JPMorgan Chase','GMAC Mortgage','SunTrust Mortgage',
         'AmTrust Bank','PNC Bank','Flagstar Bank']

# Run Function
Banks_X, Banks_y = Bank_Subsets(banks, df_X = X_train, df_y = y_train)
Banks_X_val, Banks_y_val = Bank_Subsets(banks, df_X = X_val, df_y = y_val)
Banks_X_test, Banks_y_test = Bank_Subsets(banks, df_X = X_test, df_y = y_test)
X_train = X_train.filter(regex=r'^(?!Bank).*$')
X_val = X_val.filter(regex=r'^(?!Bank).*$')
X_test = X_test.filter(regex=r'^(?!Bank).*$')

# All Banks
Banks_y['All Banks'] = y_train
Banks_X['All Banks'] = X_train
Banks_y_val['All Banks'] = y_val
Banks_X_val['All Banks'] = X_val
Banks_y_test['All Banks'] = y_test
Banks_X_test['All Banks'] = X_test

print('Shape:', X_train.shape)
Shape: (483564, 42)
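`Bank_Subsets` is defined in Functions.ipynb and is not reproduced here. The following sketch shows one way such a helper could work, assuming it uses the one-hot `Bank_*` columns to build a dictionary of per-bank feature frames and targets. The function `bank_subsets_sketch` and the toy data are hypothetical.

```python
import pandas as pd

def bank_subsets_sketch(banks, df_X, df_y):
    """Hypothetical stand-in for Bank_Subsets: one (X, y) pair per bank."""
    subsets_X, subsets_y = {}, {}
    bank_cols = [c for c in df_X.columns if c.startswith("Bank_")]
    for bank in banks:
        mask = df_X["Bank_" + bank] == 1            # rows belonging to this bank
        subsets_X[bank] = df_X.loc[mask].drop(columns=bank_cols)
        subsets_y[bank] = df_y.loc[mask]
    return subsets_X, subsets_y

# Toy data with two one-hot bank columns.
X_demo = pd.DataFrame({
    "Credit Score": [700, 650, 780, 720],
    "Bank_PNC Bank": [1, 0, 0, 1],
    "Bank_Flagstar Bank": [0, 1, 1, 0],
})
y_demo = pd.Series([0, 1, 0, 0], name="Foreclosed")

demo_X, demo_y = bank_subsets_sketch(["PNC Bank", "Flagstar Bank"], X_demo, y_demo)
print(demo_X["PNC Bank"])
print(demo_y["Flagstar Bank"])
```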

Improved Assumptions

In [7]:
# Loading models
with open(r'..\Data\Pickle\models.pkl', 'rb') as file_to_open:
    vote_models = pickle.load(file_to_open)

# Loading Thresholds
with open(r'..\Data\Pickle\model_thresholds.pkl', 'rb') as file_to_open:
    vote_thresholds = pickle.load(file_to_open)
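The structure of `vote_models` and `vote_thresholds` is not shown in this script. As a hedged illustration only, the sketch below shows one way per-classifier thresholds could be combined into a majority vote over the three classifiers named in `clfs_str`. The function `vote_predict`, the probabilities, and the cutoffs are invented for the example and may not match the saved objects.

```python
import numpy as np

def vote_predict(probas, thresholds):
    """Majority vote after applying each classifier's own cutoff.

    probas: dict of classifier name -> array of predicted probabilities
    thresholds: dict of classifier name -> probability cutoff
    """
    votes = np.array([probas[name] >= thresholds[name] for name in probas])
    # A loan is flagged when more than half of the classifiers vote for it.
    return (votes.sum(axis=0) > len(probas) / 2).astype(int)

# Hypothetical probabilities for four loans from three classifiers.
probas = {
    "RFC": np.array([0.10, 0.60, 0.40, 0.90]),
    "RFC PCA": np.array([0.20, 0.30, 0.55, 0.80]),
    "RUS Boost": np.array([0.45, 0.52, 0.51, 0.70]),
}
thresholds = {"RFC": 0.5, "RFC PCA": 0.6, "RUS Boost": 0.5}

pred = vote_predict(probas, thresholds)
print("Predicted labels:", pred)
print("Predicted foreclosure rate (%):", 100 * pred.mean())
```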
In [8]:
# Combine Train, Validation, and Testing Data
X = pd.concat([X_train, X_val, X_test], axis=0).reset_index(drop=True)
y = pd.concat([y_train, y_val, y_test], axis=0).reset_index(drop=True)
bank_idx = pd.concat([Bnk_train, Bnk_val, Bnk_test], axis=0).reset_index(drop=True)

# Initiate Dictionaries
better = {}
better_value = {}
best = {}
best_value = {}

worse = {}
worse_value = {}
worst = {}
worst_value = {}
In [9]:
# Credit Score
print('Credit Score Distribution')
print(X['Credit Score'].describe().round(0))
print('')
better['Credit Score'], \
better_value['Credit Score'] = changing_assumptions(
    'Credit Score', 75, 
    banks, bank_idx, X,
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Credit Score Distribution
count    1611881.0
mean         719.0
std           59.0
min          330.0
25%          675.0
50%          724.0
75%          770.0
max          850.0
Name: Credit Score, dtype: float64

Converting Credit Score to the 75 percentile: 770.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 1.9 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 1.2 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 1.0 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 1.6 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 1.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 1.6 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 2.8 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 3.1 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 2.1 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 1.6 %
In [10]:
# Debt-to-Income
print('Debt-to-Income Distribution')
print(X['Debt-to-Income'].describe().round(0))
print('')
better['Debt-to-Income'], \
better_value['Debt-to-Income'] = changing_assumptions(
    'Debt-to-Income', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Debt-to-Income Distribution
count    1611881.0
mean          38.0
std           12.0
min            0.0
25%           29.0
50%           38.0
75%           47.0
max           64.0
Name: Debt-to-Income, dtype: float64

Converting Debt-to-Income to the 25 percentile: 29.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 8.9 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.5 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.4 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 5.5 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 5.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 5.4 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 5.5 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 5.2 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.3 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 6.5 %
In [11]:
# Loan to Value
print('Loan-to-Value Distribution')
print(X['Loan-to-Value (LTV)'].describe().round(0))
print('')
better['Loan-to-Value (LTV)'], \
better_value['Loan-to-Value (LTV)'] = changing_assumptions(
    'Loan-to-Value (LTV)', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan-to-Value Distribution
count    1611881.0
mean          72.0
std           18.0
min            1.0
25%           63.0
50%           78.0
75%           84.0
max          154.0
Name: Loan-to-Value (LTV), dtype: float64

Converting Loan-to-Value (LTV) to the 25 percentile: 63.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 7.3 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.2 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 3.8 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 3.4 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 5.4 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 3.4 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 4.3 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 5.0 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.9 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 5.3 %
In [12]:
# Median Household Income
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
better['Median Household Income'], \
better_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 75, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 75 percentile: 53615.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 14.4 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 8.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 8.9 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 8.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 10.3 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 8.9 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 8.7 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 11.3 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 11.5 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 11.2 %
In [13]:
# Median Household Income (Best Assumption)
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
best['Median Household Income'], \
best_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 100, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 100 percentile: 101651.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 6.5 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 4.7 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 3.5 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 3.4 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 3.2 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 4.3 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 3.2 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 3.3 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.1 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 4.9 %
In [14]:
# Loan Change
print('Loan Change (1 Yr) Distribution')
print(X['Loan Change (1 Yr)'].describe().round(2))
print('')
print('Loan Change (5 Yr) Distribution')
print(X['Loan Change (5 Yr)'].describe().round(2))
print('')
better['Loan Change (1 Yr)'], \
better_value['Loan Change (1 Yr)'] = changing_assumptions(
    ['Loan Change (1 Yr)', 'Loan Change (5 Yr)'], [25, 25], 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan Change (1 Yr) Distribution
count    1611881.00
mean       15266.28
std        24070.98
min      -450000.00
25%         2291.87
50%        14106.36
75%        27660.71
max       394000.00
Name: Loan Change (1 Yr), dtype: float64

Loan Change (5 Yr) Distribution
count    1611881.00
mean       62087.90
std        35183.64
min      -244333.33
25%        37980.10
50%        61000.00
75%        86357.61
max       522500.00
Name: Loan Change (5 Yr), dtype: float64

Converting Loan Change (1 Yr) to the 25 percentile: 2292.0

Converting Loan Change (5 Yr) to the 25 percentile: 37980.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 11.1 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.6 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.4 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 4.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 7.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 6.7 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 9.9 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 8.7 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 13.6 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 7.9 %
In [15]:
# Loan Change (same joint 1 Yr / 5 Yr scenario as the previous cell, stored under the 5 Yr key)
print('Loan Change (1 Yr) Distribution')
print(X['Loan Change (1 Yr)'].describe().round(2))
print('')
print('Loan Change (5 Yr) Distribution')
print(X['Loan Change (5 Yr)'].describe().round(2))
print('')
better['Loan Change (5 Yr)'], \
better_value['Loan Change (5 Yr)'] = changing_assumptions(
    ['Loan Change (1 Yr)', 'Loan Change (5 Yr)'], [25, 25], 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan Change (1 Yr) Distribution
count    1611881.00
mean       15266.28
std        24070.98
min      -450000.00
25%         2291.87
50%        14106.36
75%        27660.71
max       394000.00
Name: Loan Change (1 Yr), dtype: float64

Loan Change (5 Yr) Distribution
count    1611881.00
mean       62087.90
std        35183.64
min      -244333.33
25%        37980.10
50%        61000.00
75%        86357.61
max       522500.00
Name: Loan Change (5 Yr), dtype: float64

Converting Loan Change (1 Yr) to the 25 percentile: 2292.0

Converting Loan Change (5 Yr) to the 25 percentile: 37980.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 11.1 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.6 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.4 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 4.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 7.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 6.7 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 9.9 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 8.7 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 13.6 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 7.9 %
In [16]:
# Bank Loan Liabilities
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Loan Liabilities (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Loan Liabilities (5 Yr)'].describe().round(2))
print('')
better['Loan Liabilities (1 Yr)'], \
better_value['Loan Liabilities (1 Yr)'] = changing_assumptions(
    ['Loan Liabilities (1 Yr)', 'Loan Liabilities (5 Yr)'], [25, 25],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Loan Liabilities (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Loan Liabilities (5 Yr), dtype: float64

Converting Loan Liabilities (1 Yr) to the 25 percentile: 1.0

Converting Loan Liabilities (5 Yr) to the 25 percentile: 1.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 13.2 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 6.7 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 7.6 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 7.7 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 9.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 9.1 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 9.0 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 7.2 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 8.8 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 10.0 %
In [17]:
# Bank Loan Liabilities (Best Assumption)
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Loan Liabilities (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Loan Liabilities (5 Yr)'].describe().round(2))
print('')
best['Loan Liabilities (1 Yr)'], \
best_value['Loan Liabilities (1 Yr)'] = changing_assumptions(
    ['Loan Liabilities (1 Yr)', 'Loan Liabilities (5 Yr)'], [100, 100],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Loan Liabilities (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Loan Liabilities (5 Yr), dtype: float64

Converting Loan Liabilities (1 Yr) to the 100 percentile: 10105.0

Converting Loan Liabilities (5 Yr) to the 100 percentile: 16358.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 0.3 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 0.2 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 0.2 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 0.3 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 0.1 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 5.2 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 0.1 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 0.1 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 0.2 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 0.4 %

In [18]:
# Save improved assumptions
data = [better, better_value, best, best_value]
with open(r"..\Data\Pickle\pred_votes_improved.pkl", "wb") as f:
    pickle.dump(data, f)

Weakened Assumptions

In [19]:
# Credit Score
print('Credit Score Distribution')
print(X['Credit Score'].describe().round(0))
print('')
worse['Credit Score'], \
worse_value['Credit Score'] = changing_assumptions(
    'Credit Score', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Credit Score Distribution
count    1611881.0
mean         719.0
std           59.0
min          330.0
25%          675.0
50%          724.0
75%          770.0
max          850.0
Name: Credit Score, dtype: float64

Converting Credit Score to the 25 percentile: 675.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 17.3 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 11.9 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 11.3 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 10.9 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 12.2 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 13.0 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 12.3 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 13.8 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 15.8 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 14.1 %
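To compare the improved and weakened credit score scenarios side by side, the printed rates can be collected into a small table. The sketch below copies the figures reported above for two banks and the pooled sample, and computes the gap between the two scenarios in percentage points. The table layout and column names are illustrative choices, not part of the original script.

```python
import pandas as pd

# Rates (in %) copied from the printed output for the credit score scenarios.
credit_score = pd.DataFrame(
    {
        "Original": [11.6, 9.7, 9.7],
        "Improved (770)": [1.9, 1.0, 1.6],
        "Weakened (675)": [17.3, 12.2, 14.1],
    },
    index=["Bank of America", "GMAC Mortgage", "All Banks"],
)

# Gap between the weakened and improved scenarios, in percentage points.
credit_score["Swing (pct. points)"] = (
    credit_score["Weakened (675)"] - credit_score["Improved (770)"]
).round(1)

print(credit_score)
```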
In [20]:
# Debt-to-Income
print('Debt-to-Income Distribution')
print(X['Debt-to-Income'].describe().round(0))
print('')
worse['Debt-to-Income'], \
worse_value['Debt-to-Income'] = changing_assumptions(
    'Debt-to-Income', 75, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Debt-to-Income Distribution
count    1611881.0
mean          38.0
std           12.0
min            0.0
25%           29.0
50%           38.0
75%           47.0
max           64.0
Name: Debt-to-Income, dtype: float64

Converting Debt-to-Income to the 75 percentile: 47.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 14.8 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 9.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 9.7 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 8.9 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 11.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 9.3 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 9.9 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 12.5 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 11.9 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 11.8 %
In [21]:
# Loan-to-Value
print('Loan-to-Value Distribution')
print(X['Loan-to-Value (LTV)'].describe().round(0))
print('')
worse['Loan-to-Value (LTV)'], \
worse_value['Loan-to-Value (LTV)'] = changing_assumptions(
    'Loan-to-Value (LTV)', 75, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan-to-Value Distribution
count    1611881.0
mean          72.0
std           18.0
min            1.0
25%           63.0
50%           78.0
75%           84.0
max          154.0
Name: Loan-to-Value (LTV), dtype: float64

Converting Loan-to-Value (LTV) to the 75 percentile: 84.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 16.4 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 9.9 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 10.8 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 9.3 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 14.4 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 9.8 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 9.8 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 12.6 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 13.1 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 13.2 %
In [22]:
# Median Household Income
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
worse['Median Household Income'], \
worse_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 25 percentile: 43298.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 13.3 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 8.7 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 7.7 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 7.4 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 9.2 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 8.1 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 10.2 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 9.9 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 10.4 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 10.4 %
In [23]:
# Median Household Income (Worst Assumption)
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
worst['Median Household Income'], \
worst_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 0, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 0 percentile: 28832.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 12.9 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 7.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 6.7 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 7.3 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 10.5 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 8.6 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 4.4 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 10.2 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 10.7 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 9.9 %
In [24]:
# Loan Change
print('Loan Change (1 Yr) Distribution')
print(X['Loan Change (1 Yr)'].describe().round(2))
print('')
print('Loan Change (5 Yr) Distribution')
print(X['Loan Change (5 Yr)'].describe().round(2))
print('')
worse['Loan Change (1 Yr)'], \
worse_value['Loan Change (1 Yr)'] = changing_assumptions(
    ['Loan Change (1 Yr)', 'Loan Change (5 Yr)'], [75, 75], 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan Change (1 Yr) Distribution
count    1611881.00
mean       15266.28
std        24070.98
min      -450000.00
25%         2291.87
50%        14106.36
75%        27660.71
max       394000.00
Name: Loan Change (1 Yr), dtype: float64

Loan Change (5 Yr) Distribution
count    1611881.00
mean       62087.90
std        35183.64
min      -244333.33
25%        37980.10
50%        61000.00
75%        86357.61
max       522500.00
Name: Loan Change (5 Yr), dtype: float64

Converting Loan Change (1 Yr) to the 75 percentile: 27661.0

Converting Loan Change (5 Yr) to the 75 percentile: 86358.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 17.0 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 6.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 10.7 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 11.0 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 11.4 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 10.1 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 13.6 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 13.6 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 13.8 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 12.9 %
In [25]:
# Loan Change (same joint 1 Yr / 5 Yr scenario as the previous cell, stored under the 5 Yr key)
print('Loan Change (1 Yr) Distribution')
print(X['Loan Change (1 Yr)'].describe().round(2))
print('')
print('Loan Change (5 Yr) Distribution')
print(X['Loan Change (5 Yr)'].describe().round(2))
print('')
worse['Loan Change (5 Yr)'], \
worse_value['Loan Change (5 Yr)'] = changing_assumptions(
    ['Loan Change (1 Yr)', 'Loan Change (5 Yr)'], [75, 75], 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Loan Change (1 Yr) Distribution
count    1611881.00
mean       15266.28
std        24070.98
min      -450000.00
25%         2291.87
50%        14106.36
75%        27660.71
max       394000.00
Name: Loan Change (1 Yr), dtype: float64

Loan Change (5 Yr) Distribution
count    1611881.00
mean       62087.90
std        35183.64
min      -244333.33
25%        37980.10
50%        61000.00
75%        86357.61
max       522500.00
Name: Loan Change (5 Yr), dtype: float64

Converting Loan Change (1 Yr) to the 75 percentile: 27661.0

Converting Loan Change (5 Yr) to the 75 percentile: 86358.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 17.0 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 6.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 10.7 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 11.0 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 11.4 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 10.1 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 13.6 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 13.6 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 13.8 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 12.9 %
In [26]:
# Bank Loan Liabilities (weakened assumption: 75th percentile)
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Loan Liabilities (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Loan Liabilities (5 Yr)'].describe().round(2))
print('')
worse['Loan Liabilities (1 Yr)'], \
worse_value['Loan Liabilities (1 Yr)'] = changing_assumptions(
    ['Loan Liabilities (1 Yr)', 'Loan Liabilities (5 Yr)'], [75, 75],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Loan Liabilities (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Loan Liabilities (5 Yr), dtype: float64

Converting Loan Liabilities (1 Yr) to the 75 percentile: 2.0

Converting Loan Liabilities (5 Yr) to the 75 percentile: 3.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 11.0 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 7.8 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 3.8 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 7.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 8.2 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 8.6 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 8.5 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 5.2 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 6.1 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 8.4 %
In [27]:
# Bank Loan Liabilities (worst assumption: 0th percentile, i.e. the minimum)
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Loan Liabilities (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Loan Liabilities (5 Yr)'].describe().round(2))
print('')
worst['Loan Liabilities (1 Yr)'], \
worst_value['Loan Liabilities (1 Yr)'] = changing_assumptions(
    ['Loan Liabilities (1 Yr)', 'Loan Liabilities (5 Yr)'], [0, 0],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)
Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Loan Liabilities (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Loan Liabilities (5 Yr), dtype: float64

Converting Loan Liabilities (1 Yr) to the 0 percentile: -1.0

Converting Loan Liabilities (5 Yr) to the 0 percentile: -1.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 10.6 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 5.9 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.5 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 6.6 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 13.3 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 10.0 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 10.6 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 11.1 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 15.7 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 9.2 %
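
To compare the two Loan Liabilities scenarios side by side, the rates printed above can be turned into percentage-point changes. The sketch below copies the printed figures by hand; the column names `p75` and `p0` are labels chosen for this example and do not come from the notebook's own objects.

```python
import pandas as pd

# Rates (in %) copied from the two Loan Liabilities outputs above
rates = pd.DataFrame(
    {
        "original": [11.6, 7.6, 7.7, 7.6, 9.7, 10.3, 9.3, 8.8, 11.7, 9.7],
        "p75": [11.0, 7.8, 3.8, 7.2, 8.2, 8.6, 8.5, 5.2, 6.1, 8.4],
        "p0": [10.6, 5.9, 5.5, 6.6, 13.3, 10.0, 10.6, 11.1, 15.7, 9.2],
    },
    index=[
        "Bank of America", "Wells Fargo Bank", "CitiMortgage",
        "JPMorgan Chase", "GMAC Mortgage", "SunTrust Mortgage",
        "AmTrust Bank", "PNC Bank", "Flagstar Bank", "All Banks",
    ],
)

# Percentage-point change: predicted rate minus original rate
change = rates[["p75", "p0"]].sub(rates["original"], axis=0).round(1)
print(change)
```

A negative value means the model predicts fewer foreclosures under that assumption than the bank's original rate.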
In [28]:
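# Save weakened assumptions (list order: worse, worse_value, worst, worst_value)
data = [worse, worse_value, worst, worst_value]
# Raw string so the Windows backslashes are not read as escape sequences
with open(r"..\Data\Pickle\pred_votes_weakened.pkl", "wb") as f:
    pickle.dump(data, f)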
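
Reading the file back requires unpacking the list in the same order it was saved. The round trip below is a sketch under stated assumptions: the dictionary contents are placeholders (the real structures returned by `changing_assumptions` are not shown here), and a temporary directory stands in for the project's Pickle folder.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder objects; the real contents come from changing_assumptions
worse = {"Loan Change (1 Yr)": {"All Banks": 12.9}}
worse_value = {"Loan Change (1 Yr)": 27661.0}
worst = {"Loan Liabilities (1 Yr)": {"All Banks": 9.2}}
worst_value = {"Loan Liabilities (1 Yr)": -1.0}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pred_votes_weakened.pkl"  # stand-in location

    # Save in a fixed order
    with open(path, "wb") as f:
        pickle.dump([worse, worse_value, worst, worst_value], f)

    # Load and unpack in the same order
    with open(path, "rb") as f:
        loaded_worse, loaded_worse_value, loaded_worst, loaded_worst_value = pickle.load(f)

print(loaded_worse_value)
```

Saving a dictionary keyed by name instead of a list would remove the dependence on order, at the cost of changing the file format any downstream script expects.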